Phasing by molecular replacement remains difficult for targets 
that are far from the search model or in situations where the 
crystal diffracts only weakly or to low resolution. Here, the 
process of determining and refining the structure of Cgl1109, 
a putative succinyl-diaminopimelate desuccinylase from 
Corynebacterium glutamicum, at ~3 Å resolution is 
described using a combination of homology modeling with 
MODELLER, molecular-replacement phasing with Phaser, 
deformable elastic network (DEN) refinement and automated 
model building using AutoBuild in a semi-automated fashion, 
followed by final refinement cycles with phenix.refine and 
Coot. This difficult molecular-replacement case illustrates the 
power of including DEN restraints derived from a starting 
model to guide the movements of the model during refinement. 
The resulting improved model phases provide better starting 
points for automated model building and produce more 
significant difference peaks in anomalous difference Fourier 
maps to locate anomalous scatterers than does standard 
refinement. This example also illustrates a current limitation 
of automated procedures that require manual adjustment of 
local sequence misalignments between the homology model 
and the target sequence. 
1. Introduction 

Successful molecular-replacement phasing depends on a number of
factors such as the proximity of the search model to the true
structure, the quality and completeness of the diffraction data
(especially at lower resolution), the solvent content, the presence
of noncrystallographic symmetry and the limiting resolution (d_min)
of the crystals. Although recent advances in reciprocal-space
refinement such as deformable elastic network (DEN) refinement
(Schröder et al., 2010), jelly-body refinement (Murshudov et al.,
2011) and real-space refinement (DiMaio et al., 2011) enable
structure determination from more distant models, the ultimate
success of molecular-replacement phasing depends on whether
previously unknown parts of the model become visible in the
electron-density maps or whether conformational changes in the
structure are uniquely determined.

DEN refinement consists of torsion-angle refinement interspersed
with B-factor refinement in the presence of a sparse set of distance
restraints (typically one per atom, randomly selected) which are
initially obtained from a reference model (Schröder et al., 2010).
The reference model can simply be the starting model for refinement
or it can be a homology or predicted model that provides external
information. During the process of torsion-angle refinement with a
slow-cooling simulated-annealing schema, the DEN distance restraints
are adjusted in order to fit the diffraction data. The degree of this
adjustment or deformation of the initial distance restraints is
controlled by a parameter γ. The method of jelly-body refinement
(Murshudov et al., 2011) bears some resemblance to the special case
of DEN refinement with γ = 1. The weight of the DEN distance
restraints is controlled by another parameter, w_DEN. A
two-dimensional grid search for (γ, w_DEN) is performed in which
multiple refinements for each parameter pair are performed with
different initial random-number seeds for the velocity assignments of
the torsion-angle molecular-dynamics method and different randomly
selected DEN distance restraints. The globally optimal model (in
terms of minimal R_free, possibly assisted by geometric validation
criteria) is then used for further refinement and model building. By
default, the last two macrocycles of the DEN refinement protocol are
performed without any DEN restraints, so the resulting model is not
strained or biased by the reference model (although such restraints
can be useful at very low resolution). In other words, the DEN
restraints guide the refinement path, increasing the chances of
obtaining a better model than with standard refinement. In addition,
the deformability of the DEN restraints makes this method more
general than rigid-body or normal-mode refinement. Thus, DEN
refinement is a general refinement method that can be applied to any
starting model and reference model. In practice, the reference model
is likely to be identical to the starting model. However, there are
situations in which the reference model can be different from the
starting model. For example, re-refinements of existing structures
can be performed using structures of homologous proteins that were
not available at the time the original structure was determined.
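As a rough illustration of how γ and w_DEN enter the method, the
sketch below updates the equilibrium distance of a single DEN
restraint. The update rule, the relaxation rate `kappa` and the
harmonic form of the restraint energy are assumptions based on the
general description of DEN refinement (Schröder et al., 2010) and are
not stated in the text above; all function and variable names are
hypothetical.

```python
def update_den_distance(d0, d_model, d_ref, gamma, kappa=0.1):
    """Return the next equilibrium distance of one DEN restraint.

    Assumed update rule (not taken from this paper):
        d0_next = (1 - kappa) * d0
                  + kappa * (gamma * d_model + (1 - gamma) * d_ref)
    gamma = 0 pulls the restraint back toward the reference model;
    gamma = 1 lets the restraint follow the current model.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    target = gamma * d_model + (1.0 - gamma) * d_ref
    return (1.0 - kappa) * d0 + kappa * target


def den_energy(distances, equilibrium, w_den):
    """Harmonic restraint energy w_den * sum((d - d0)^2) (assumed form)."""
    return w_den * sum((d - d0) ** 2 for d, d0 in zip(distances, equilibrium))


# One restraint whose atoms are 8.0 A apart in the reference model
# and 9.0 A apart in the current model (made-up numbers).
d_ref, d_model = 8.0, 9.0
for gamma in (0.0, 0.5, 1.0):
    d0 = d_ref
    for _ in range(50):  # repeated updates while the model is held fixed
        d0 = update_den_distance(d0, d_model, d_ref, gamma)
    print(f"gamma = {gamma:.1f}: equilibrium distance after 50 updates = {d0:.2f} A")
```

With the model held fixed, the equilibrium distance relaxes toward a
γ-weighted mix of the current and reference distances, which is one
way to picture why γ = 1 behaves like a restraint to the current
model only.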

A number of highly automated procedures for model building and model
rebuilding have recently been developed (Levitt, 2001; Oldfield,
2002, 2003; Ioerger & Sacchettini, 2003; DePristo et al., 2005;
Cowtan, 2006; Langer et al., 2008; Terwilliger et al., 2008). A key
feature of several of these procedures is alternation between model
building and calculation of electron-density maps. Each local
improvement in the model leads to an overall improvement in the map,
which in turn makes additional improvements in the model possible. In
this work, we use one of these procedures, the AutoBuild method
(Terwilliger et al., 2008) as implemented in PHENIX (Adams et al.,
2010), as a core tool for model improvement. In one cycle of model
rebuilding with AutoBuild, a density-modified electron-density map is
calculated beginning with phases from the working model and including
any available experimental phase information. A new model is then
built and refined with phenix.refine (Afonine et al., 2005). Two
methods for rebuilding the working model are used here. In the first
method, several new models (or segments) are built without reference
to the working model. The parts of the new models and the working
model that best fit the electron-density map are then merged together
to form a composite model. Using this procedure, the model can change
in any way during rebuilding. In the second method, termed
'rebuilding in place', segments of the working model are rebuilt one
at a time, maintaining connectivity and sequence alignment. This
'rebuilding-in-place' procedure therefore adjusts the position of
existing atoms in the structure and can be thought of as an extension
of refinement.

In this paper, we describe the process of determining the crystal
structure of Cgl1109 (Joint Center for Structural Genomics target
376512 listed in TargetDB; http://targetdb.sbkb.org/TargetDB/), a
putative succinyl-diaminopimelate desuccinylase from Corynebacterium
glutamicum, using a combination of molecular-replacement phasing,
refinement and semi-automated model building. At the later stages,
experimental phase information from SeMet MAD phasing was included in
the refinement. It should be noted that these MAD phases were of
insufficient quality to allow automated model building, and manual
building would have been exceedingly difficult and time-consuming
even for a highly skilled crystallographer (see §3.6). Thus,
molecular-replacement phasing was attempted. However, manual
interpretation of the initial electron-density map again proved
difficult. Indeed, Cgl1109 was one of the cases used to test the
performance of real-space refinement of the molecular-replacement
solution in conjunction with the Rosetta empirical energy function
(DiMaio et al., 2011; case 10 in Table 1 of that reference), but the
refinement was not completed owing to poor or disordered density in
numerous regions and low resolution (R_free = 0.39; Table 1 in DiMaio
et al., 2011).

Here, we present an independent structure determination of Cgl1109 at
~3 Å resolution without use of the previous Rosetta model and
molecular-replacement solution. A homology model of Cgl1109 was
created using sequence alignment with PROMALS3D (Pei et al., 2008)
and modeling with MODELLER (Sali & Blundell, 1993) starting from the
structure of succinyl-diaminopimelate desuccinylase from the
β-proteobacterium Neisseria meningitidis (PDB entry 1vgy; Badger et
al., 2005). The structure was determined by molecular replacement
with Phaser (McCoy et al., 2007) using a model edited with Sculptor
(Bunkóczi & Read, 2011), followed by DEN refinement with a full
(γ, w_DEN) grid search (Schröder et al., 2010), automated model
building with AutoBuild, determination of the selenium sites by
anomalous difference Fourier maps, calculation of MAD phase
probability distributions using a maximum-likelihood method (Burling
et al., 1996) and completion of the refinement in a semi-automated
fashion using AutoBuild and phenix.refine (Adams et al., 2010) with
the MLHL target function (Pannu et al., 1998). The final model has
excellent geometry and R_cryst and R_free values of 0.238 and 0.257,
respectively, at 2.97 Å resolution.
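For readers less familiar with these statistics, the following
minimal sketch computes R_cryst and R_free from the conventional
definition R = Σ||F_obs| − |F_calc||/Σ|F_obs| given in the footnote
to Table 1. The amplitudes are made up, the function names are
hypothetical, and the amplitudes are assumed to be on a common scale
already (refinement programs apply a scale factor first).

```python
def r_factor(f_obs, f_calc):
    """R = sum(||Fobs| - |Fcalc||) / sum(|Fobs|) over the given reflections."""
    if len(f_obs) != len(f_calc) or len(f_obs) == 0:
        raise ValueError("need two equally long, non-empty amplitude lists")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def r_cryst_and_r_free(f_obs, f_calc, is_test):
    """Split reflections by the test-set flag and return (R_cryst, R_free)."""
    work = [(o, c) for o, c, t in zip(f_obs, f_calc, is_test) if not t]
    test = [(o, c) for o, c, t in zip(f_obs, f_calc, is_test) if t]
    r_cryst = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in test], [c for _, c in test])
    return r_cryst, r_free


# Made-up amplitudes, for illustration only.
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 85.0, 50.0, 44.0]
is_test = [False, False, False, True]  # one reflection set aside for R_free
r_cryst, r_free = r_cryst_and_r_free(f_obs, f_calc, is_test)
print(f"R_cryst = {r_cryst:.3f}, R_free = {r_free:.3f}")

# Test-set fraction implied by Table 1: 1649 of 16098 reflections.
print(f"test-set fraction = {1649 / 16098:.4f}")
```

The test-set reflections are never used in refinement, so R_free
reports how well the model predicts data it has not been fitted to.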

This example shows that DEN refinement with a full (γ, w_DEN) grid
search generally produces models that are closer to the true
structure than standard (gradient-descent) or simulated-annealing
refinement methods, resulting in improved model phases and better R
values. The improved model phases in turn provide better starting
points for automated model building with AutoBuild. This approach
ultimately produced a well refined structure that would have been
very difficult to achieve with manual model building and standard
refinement. Moreover, the improved model phases produce more
significant difference peaks that better locate the anomalously
scattering selenium sites. Compared with the Rosetta refinement
method (DiMaio et al., 2011), DEN refinement has the advantage that
it does not require extensive empirical energy-function simulations
and that it has been shown to also work well for structures
determined at low resolution (worse than 3.5 Å). The successful
application to Cgl1109 demonstrates that DEN refinement also has
significant utility for structures determined at ~3 Å resolution,
especially for cases of anisotropic diffraction and/or high B
factors. The research performed in this paper also serves as a
tutorial for the combined use of various methods and computer
software systems to tackle difficult molecular-replacement cases. The
corresponding data files have been made available on the CNS website
in the tutorial section for DEN refinement.



2. Materials and methods 

2.1. Crystallization 

Cgl1109 was expressed, purified and crystallized using the JCSG
high-throughput structural biology pipeline (Elsliger et al., 2010)
and standard JCSG protocols with crystallization modifications.
Briefly, clones were generated using the Polymerase Incomplete Primer
Extension (PIPE) cloning method (Klock et al., 2008). The gene
encoding Cgl1109 (GenBank NP_600337, gi|19552335; UniProt Q59284) was
PCR-amplified from C. glutamicum 534 genomic DNA using PfuTurbo DNA
polymerase (Stratagene) and I-PIPE primers (forward primer,
5'-ctgtacttccagggcCTGTACTTCCAGGGCATGAACTCTGAACTCAAACCAGGATTAG-3';
reverse primer,
5'-aattaagtcgcgttaAATTAAGTCGCGTTACTCGCTCAGGTACTGCTTCAAAATTGC-3';
target sequence in upper case) that included sequences for the
predicted 5' and 3' ends. The genomic DNA used here and obtained from
the American Type Culture Collection (ATCC) contained two amino-acid
substitutions (Glu4Asn and Lys6Gln) and one amino-acid deletion
(Leu5), as confirmed by DNA sequencing, when compared with the
available GenBank sequence from C. glutamicum 534; these mutations
are unlikely to affect the biochemical properties of the enzyme based
on their locations. Expression was performed in
selenomethionine-containing medium at 298 K. Selenomethionine was
incorporated via inhibition of methionine biosynthesis (Van Duyne et
al., 1993), which does not require a methionine-auxotrophic strain.
The protein was purified by two steps of nickel-chelating
chromatography (GE Healthcare) with an intermediate step involving
TEV protease cleavage of the purification tag and was concentrated to
18.5 mg ml⁻¹ by centrifugal ultrafiltration (Millipore) for
crystallization trials. Crystals used for structure determination
were grown using Microseed Matrix Screening (MMS; Ireton & Stoddard,
2004; D'Arcy et al., 2007) as implemented with an Oryx8
crystallization robot (Douglas Instruments). Initial seed crystals
used for MMS were grown using the nanodroplet vapor-diffusion method
from sitting drops composed of 200 nl protein solution mixed with
200 nl crystallization solution equilibrated against a 50 µl
reservoir at 293 K for 48 d prior to harvest. The crystals used for
the seed stock were obtained using a precipitating reagent consisting
of 0.2 M MgCl₂, 30% PEG 400, 0.1 M HEPES pH 7.5. The entire
crystallization drop (400 nl) containing the seed crystals was
aspirated using a pipette and placed in a Seed Bead tube (Hampton
Research) stored on ice. To ensure that all crystals were transferred
to the Seed Bead tube, the empty shelf was rinsed with 50 µl mother
liquor. The Seed Bead tube containing the seed stock was vortexed
vigorously for three intervals of 30 s, keeping the tube on ice
between each vortex. Final MMS crystallization plates were set up on
the Oryx8 as sitting drops composed of 150 nl protein, 100 nl
crystallization solution and 50 nl seed stock. The final crystals
used for structure determination were obtained from a crystallization
reagent consisting of 43.1% polyethylene glycol 400, 0.2 M sodium
chloride, 0.1 M sodium/potassium phosphate pH 6.41 and were grown at
293 K for 21 d prior to harvest. 6 mM ZnCl₂ was added to the protein
prior to setup. No additional cryoprotectant was added to the
crystal. Initial screening for diffraction was carried out using the
Stanford Automated Mounting system (SAM; Cohen et al., 2002) at the
Stanford Synchrotron Radiation Lightsource (SSRL; Menlo Park,
California, USA).

2.2. X-ray data collection, processing, structure validation 
and deposition 

MAD data were collected on beamline 9-2 at the SSRL at wavelengths
corresponding to the high-energy remote (λ₁), inflection point (λ₂)
and peak (λ₃) wavelengths of a selenium MAD experiment using the
Blu-Ice (McPhillips et al., 2002) data-collection environment. The
data sets were collected at 100 K using a MAR Mosaic 325 CCD detector
(Rayonix, USA). The MAD data were integrated and reduced using XDS
(Kabsch, 2010) and scaled with XSCALE (Kabsch, 2010). Diffraction
data and refinement statistics are summarized in Table 1. The quality
of the crystal structure was analyzed using the JCSG Quality Control
server (http://smb.slac.stanford.edu/jcsg/QC), which verifies the
stereochemical quality of the model using AutoDepInputTool (Yang et
al., 2004), MolProbity (Chen et al., 2010) and WHAT IF v.5.0 (Vriend,
1990); the agreement between the atomic model and the data using
SFCHECK v.4.0 (Vaguine et al., 1999) and RESOLVE (Terwilliger, 2000);
the protein sequence using ClustalW (Thompson et al., 1994); atom
occupancies using MOLEMAN2 (Kleywegt, 2000) and the consistency of
NCS pairs; and evaluates R_free/R_cryst and the maximum/minimum
B factors. Atomic coordinates and experimental data for Cgl1109 from
C. glutamicum to 2.97 Å resolution (PDB entry 3tx8) have been
deposited in the Protein Data Bank (http://www.wwpdb.org).

Table 1
Crystallographic data and refinement statistics for Cgl1109.

Values in parentheses are for the highest resolution shell.

                                  λ₁ MAD-Se               λ₂ MAD-Se               λ₃ MAD-Se
                                  (remote)                (inflection point)      (peak)
Space group                       P6₅22
Unit-cell parameters (Å)          a = 82.90, b = 82.90, c = 364.18
Data collection
  Wavelength (Å)                  0.9116                  0.9794                  0.9792
  Resolution range (Å)            29.5-2.97 (3.05-2.97)   29.5-3.17 (3.26-3.17)   29.5-2.97 (3.05-2.97)
  No. of observations             73623                   60577                   111259
  No. of unique reflections       16179                   13404                   16192
  Completeness (%)                99.1 (98.5)             99.1 (99.0)             99.2 (98.4)
  Mean I/σ(I)                     13.1 (1.5)              14.9 (2.8)              17.2 (1.7)
  R_merge on I† (%)               9.5 (124.8)             9.1 (64.7)              10.6 (150.6)
  R_meas on I‡ (%)                10.1 (140.6)            10.4 (72.9)             11.4 (162.5)
Model and refinement statistics
  Resolution range (Å)            29.5-2.97
  No. of reflections (total)      16098§
  No. of reflections (test set)   1649
  Completeness (%)                99.07
  Data set used in refinement     λ₁ MAD-Se
  Cutoff criterion                |F| > 0
  R_cryst¶                        0.238
  R_free¶                         0.257
Stereochemical parameters
  Restraints (r.m.s.d. observed)
    Bond angles (°)               0.625
    Bond lengths (Å)              0.003
  Average protein isotropic
    B factor (Å²)                 99.7††
  Maximum-likelihood-based
    coordinate error (Å)          0.71
  Protein residues                360
  Phosphates/chlorides            1/1

† $R_{\rm merge} = \sum_{hkl}\sum_i |I_i(hkl) - \langle I(hkl)\rangle| / \sum_{hkl}\sum_i I_i(hkl)$
(Diederichs & Karplus, 1997). ‡ $R_{\rm meas}$ (redundancy-independent
$R_{\rm merge}$) $= \sum_{hkl} [N(hkl)/(N(hkl) - 1)]^{1/2} \sum_i |I_i(hkl) - \langle I(hkl)\rangle| / \sum_{hkl}\sum_i I_i(hkl)$.
§ Typically, the number of unique reflections used in refinement is
slightly less than the total number that were integrated and scaled.
Reflections are excluded owing to negative intensities and rounding
errors in the resolution limits and unit-cell parameters.
¶ $R_{\rm cryst} = \sum_{hkl} \big||F_{\rm obs}| - |F_{\rm calc}|\big| / \sum_{hkl} |F_{\rm obs}|$,
where F_calc and F_obs are the calculated and observed
structure-factor amplitudes, respectively. R_free is the same as
R_cryst, but calculated using 10.24% of the total reflections that
were chosen at random and omitted from refinement. †† This value
represents the total B, which includes overall TLS refinement and
residual B components.

2.3. Homology modeling, structure determination and 
refinement 

PROMALS3D (Pei et al., 2008) was used for primary-sequence alignment,
MODELLER (Sali & Blundell, 1993) was used for profile generation and
homology modeling, Sculptor (Bunkóczi & Read, 2011) and Phaser (McCoy
et al., 2007) were used for molecular-replacement phasing, CNS v.1.3
was used for DEN refinement (Schröder et al., 2010), AutoBuild
(Terwilliger et al., 2008) was used for automated model building,
Coot (Emsley et al., 2010) was used for manual rebuilding and
structure validation, CNS was used for MAD phasing and density
modification (Brünger et al., 1998), phenix.refine (Adams et al.,
2010) was used for final refinement cycles and PyMOL (DeLano, 2002)
was used for molecular illustrations and structure and
electron-density map superposition.

3. Results and discussion 

3.1. Search for similar structures, primary-sequence alignment and 
homology modeling 

A profile of structures related to the genomic sequence of Cgl1109
(Fig. 1) was generated using the MODELLER build_profile.py script
(http://www.salilab.org/modeller/tutorial/basic.html) and the current
protein database file pdb_95.pir (updated 24 February 2011) available
in the supplementary file download section of the MODELLER website.
This produced a list of eight homologous structures (PDB entries
1cg2, 3ct9, 2f7v, 3gb0, 3isz, 3pfo, 2rb7 and 1vgy) with sequence
identities that varied between 24 and 28%. A cluster analysis of
these structures using the MODELLER compare.py script revealed that
they are all relatively equidistant from each other, with the
exception of PDB entries 3isz and 1vgy, which are closer to each
other than to the other structures. Since there is no significant
difference in terms of sequence identity to the target structure
among these candidate models, the one with the highest resolution and
best R_free value was chosen for all further calculations (PDB entry
1vgy chain A, referred to as 1vgy-A in the following), which was also
the template used for Rosetta-based molecular replacement (DiMaio et
al., 2011). The success of molecular replacement depends on optimal
sequence alignment between homologous structure and target sequence
(Schwarzenbacher et al., 2004; Bunkóczi & Read, 2011). To make some
use of the structural information in the primary-sequence alignment
we used the PROMALS3D program (Pei et al., 2008), resulting in the
alignment shown in Fig. 1. PROMALS3D can produce more accurate
sequence alignments compared with methods that do not make use of
secondary-structure information for sequence pairs with at least 20%
identity (Pei et al., 2008). Other methods such as HHpred (Söding,
2005) that include secondary-structure information may provide
alternative alignments (see §4).
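The sequence identities quoted above depend on the alignment and on
how identity is counted. As a minimal sketch, the function below
counts identical residues over the columns in which neither sequence
has a gap; this choice of denominator is an assumption (other
conventions divide by the alignment length or by the shorter
sequence), and the function name is hypothetical. The two strings are
the first alignment block of Fig. 1.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not aligned:
        return 0.0
    matches = sum(1 for a, b in aligned if a == b)
    return 100.0 * matches / len(aligned)


# First alignment block of Fig. 1 (1vgy chain A versus Cgl1109).
block_1vgy = "TETQSLELAKELISRPSVTPDDRDCQKLMAERLHKIGFAAEEMHFGNTKNIWLRRGTK-APVVCFAGHTD"
block_cgl = "-LGDPIVLTQRLVDIPSPSGQEKQIADEIEDALRNLNLPGVEVFRF-NNNVLARTNRGLASRVMLAGHID"

# A single block is not representative of the full-length identity.
print(f"identity in this block: {percent_identity(block_1vgy, block_cgl):.1f}%")
```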

The primary-sequence alignment obtained with PROMALS3D and the
structure of 1vgy-A were used as input for the generation of a
homology model using the model-single.py script of MODELLER. All
default parameters were used except that the a.very_fast() option was
specified to perform a limited amount of target-function optimization
with conjugate-gradient minimization. This limited amount of energy
minimization keeps the resulting homology model closer to the crystal
structure of 1vgy-A, which may be a benefit since 1vgy-A itself
produces a molecular-replacement solution (see §3.2). In general, it
might be beneficial to try this fast optimization method as well as
models generated by MODELLER with more extensive optimization and
then to judge the models according to the molecular-replacement
score.
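A script of the kind described above might look like the following
sketch. It is modeled on the model-single.py example in the MODELLER
tutorial and is not the script actually used in this work; the
alignment file name and the sequence codes in the usage comment are
hypothetical, and MODELLER must be installed and licensed for the
function to run. Recent MODELLER releases also offer the
capitalized class names (Environ, AutoModel); the older names are
used here.

```python
def build_fast_model(alnfile, template, target):
    """Build one homology model with minimal optimization (needs MODELLER)."""
    # Imported lazily so that this file can be loaded without MODELLER.
    from modeller import environ
    from modeller.automodel import automodel

    env = environ()
    model = automodel(env, alnfile=alnfile, knowns=template, sequence=target)
    model.very_fast()         # request only a limited, rapid optimization
    model.starting_model = 1
    model.ending_model = 1    # build a single model
    model.make()
    return model


# Hypothetical usage (file name and codes are placeholders):
# build_fast_model("Cgl1109-1vgyA.ali", "1vgyA", "Cgl1109")
```

Omitting the very_fast() call gives the default, more extensive
optimization, so both variants can be generated and compared by
their molecular-replacement scores.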

3.2. Molecular-replacement phasing 

Molecular-replacement phasing using Phaser (McCoy et al., 2007) was
performed with two different search models: the 1vgy-A crystal
structure and the homology model of Cgl1109 obtained by MODELLER. The
original B factors were used for the 1vgy-A search model. The
diffraction data for the Cgl1109 crystal structure were quite
anisotropic and the effective overall B factors along the principal
axes of the unit cell ranged from 60 to 110 Å². The relatively high
anisotropy and high B factors made the structure determination
considerably more challenging than for many other structures at a
similar resolution of about 3 Å. With the 1vgy-A search model, after
clustering of the rotation-function and translation-function peaks
and the purging of peaks below a 75% threshold (the default settings
in Phaser), a single solution emerged with RFZ = 3.2, TFZ = 9.7,
LLG = 65, R_cryst = 0.65 and six clashes.



1vgy_A    1  TETQSLELAKELISRPSVTPDDRDCQKLMAERLHKIGFAAEEMHFGNTKNIWLRRGTK-APVVCFAGHTD   69
Cgl1109   1  -LGDPIVLTQRLVDIPSPSGQEKQIADEIEDALRNLNLPGVEVFRF-NNNVLARTNRGLASRVMLAGHID   68

1vgy_A   70  VVPTGPVEKWDSPPFEPAERDGRLYGRGAADMKTSIACFVTACERFVAKHPNHQGSIALLITSDEEGDAL  139
Cgl1109  69  TVPIA-------DNLPSRVEDGIMYGCGTVDMKSGLAVYLHTFATLATST-ELKHDLTLIAYECEEVADH  130

1vgy_A  140  DGTTKVVDVLKARDELIDYCIVGEPTAVDKLGDMIKNGRRGSLSGNLTVKGKQGHIAYPHLAINPVHTFA  209
Cgl1109 131  LNGLGHIRDEHPEWLAADLALLGEPTG-----GWIEAGCQGNLRIKVTAHGVRAHSARSWLGDNAMHKLS  195

1vgy_A  210  PALLELTQEVWDEGNE--YFPPTSFQISNINGGTGATNVIPGELNVKFNFRFSTESTEAGLKQRVHAILD  277
Cgl1109 196  PIISKVAAYKAAEVNIDGLTYREGLNIVFCESGV-ANNVIPDLAWMNLNFRFAPNRDLNEAIEHVVETLE  264

1vgy_A  278  KHG-VQYDLQWSCSGQPFLTQAGKLTDVARAAIAETCGIEAELSTTGGTSDGRFIKAMAQELIELGPSNA  346
Cgl1109 265  LDGQDGIEWAVEDGAGGALPG---LGQQVTSGLIDAVGREKIRA-KFGWTDVSRFSAMGIPALNFGAGDP  330

1vgy_A  347  -TIHQINENVRLNDIPKLSAVYEGILVRLL  375
Cgl1109 331  SFAHKRDEQCPVEQITDVAAILKQYLSE--  358

Figure 1 

Primary-sequence alignment between 1vgy (chain A) and Cgl1109. The
alignment obtained by PROMALS3D (Pei et al., 2008) is shown. The
first line in each block shows conservation indices for positions
with a conservation index above 4. The last two lines show consensus
amino-acid sequence (Consensus_aa) and consensus predicted secondary
structure (Consensus_ss). The representative sequences are named in
magenta and are colored according to predicted secondary structure
(red, α-helix; blue, β-strand). The first and last residue numbers of
each sequence in each alignment block are shown before and after the
sequences, respectively. Consensus-predicted secondary-structure
symbols: α-helix, h; β-strand, e. Consensus amino-acid symbols are as
follows (conserved amino acids are shown in bold uppercase letters):
aliphatic (I, V, L), l; aromatic (Y, H, W, F), @; hydrophobic (W, F,
Y, M, L, I, V, A, C, T, H), h; alcohol (S, T), o; polar residues (D,
E, H, K, N, Q, R, S, T), p; tiny (A, G, C, S), t; small (A, G, C, S,
V, N, D, T, P), s; bulky residues (E, F, I, K, L, M, Q, R, W, Y), b;
positively charged (K, R, H), +; negatively charged (D, E), -;
charged (D, E, K, R, H), c. Note that the sequence numbers refer to
the genomic sequence of Cgl1109 (taking into account the minor
mutations in the construct used for crystallization; see text) and
1vgy. The residue numbering in the deposited PDB file (PDB entry
3tx8) begins with the first residue of the expression construct used,
so it is offset by 11 residues compared with the genomic sequence.
The MODELLER search model (with B factors set to a uniform value of
50 Å²) was first edited using Sculptor (Bunkóczi & Read, 2011) with
the PROMALS3D alignment (Fig. 1) in order to trim surface side chains
(as suggested by Schwarzenbacher et al., 2004) and to modify the B
factors of the search model according to sequence similarity between
Cgl1109 and 1vgy-A (the similarity score was used for the B-factor
modeling and the Schwarzenbacher score was used for the pruning).
After clustering of the rotation-function and translation-function
peaks and purging peaks below a 75% threshold (default settings in
Phaser), a single solution emerged with RFZ = 3.2, TFZ = 9.9,
LLG = 75, R_cryst = 0.65 and 11 clashes. The position and orientation
of this solution were very similar to those obtained with molecular
replacement using the 1vgy-A search model, lending credence to the
correctness of the solution. Furthermore, the solution was determined
to be identical to that found by molecular replacement with the
Rosetta search model (DiMaio et al., 2011) apart from application of
symmetry and lattice operators. However, Phaser was unable to produce
the correct solution when using a fully optimized model obtained with
the default settings in MODELLER [as opposed to the minimal
a.very_fast() setting]; inspection of the optimized MODELLER model
revealed that it had significantly moved away from the 1vgy-A
template and thus was apparently too distant from the true structure
of Cgl1109 to produce a molecular-replacement solution. This example
shows that it is useful to try different homology models and to score
them according to the criteria provided by the particular
molecular-replacement method used, e.g. rotation-function and
translation-function Z scores and log-likelihood gain in Phaser. In
general, it is advisable to try additional searches in which the
search model is broken up into subdomains that may exhibit different
relative orientations and translations. However, this was unnecessary
for Cgl1109 as the subdomain placements were very similar between
Cgl1109 and 1vgy-A (see below).

Figure 2 

Interaction between symmetry-related molecules. A primary molecule
(orange) and the nearest symmetry-related molecules (blue) obtained
by applying the symmetry operators of the space group of the crystal
(P6₅22) to the primary molecule are shown, as well as lattice
translations. Taken together, all these molecules form a network of
interactions which is connected throughout the crystal in all three
dimensions. The molecules interact through three interfaces, labelled
1, 2 and 3. Interface 2' is related by crystallographic symmetry to
interface 2. Of the three interfaces, interface 1 involves the most
extensive interactions, with a buried surface area of 1569 Å²
(compared with 541 Å² for interface 2 and 276 Å² for interface 3; the
buried surface areas were computed with the PDBePISA server).
Considering the extensive interactions, interface 1 is likely to
promote dimerization of the molecule, as is also suggested by the
PDBePISA server.
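The comparison of search models by their Phaser scores discussed in
this section can be sketched as follows. The scores are those
reported above for the two search models; ordering by log-likelihood
gain and then by TFZ is an illustrative assumption rather than a rule
prescribed by Phaser, and the class and function names are
hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MrSolution:
    search_model: str
    rfz: float   # rotation-function Z score
    tfz: float   # translation-function Z score
    llg: float   # log-likelihood gain
    clashes: int


def rank_solutions(solutions):
    """Sort by log-likelihood gain, then TFZ, both in descending order."""
    return sorted(solutions, key=lambda s: (s.llg, s.tfz), reverse=True)


solutions = [
    MrSolution("1vgy-A crystal structure", rfz=3.2, tfz=9.7, llg=65, clashes=6),
    MrSolution("MODELLER model (very_fast, Sculptor-edited)",
               rfz=3.2, tfz=9.9, llg=75, clashes=11),
]

for s in rank_solutions(solutions):
    print(f"{s.search_model}: LLG = {s.llg}, TFZ = {s.tfz}, "
          f"RFZ = {s.rfz}, clashes = {s.clashes}")
```

Scores from different search models are only a guide; agreement
between independently obtained solutions and sensible crystal packing
remain important checks.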

A further validation of a molecular-replacement solution 
is provided by the overall crystal-packing arrangement and 
connectivity of the arrangement, i.e. no empty spaces should 
be left between the layers of molecules. Fig. 2 illustrates the 
connectivity of the arrangement and the three different 
interfaces that are created by symmetry and lattice operators. 

3.3. DEN refinement 

Figure 3

DEN refinement starting from the molecular-replacement solution. The
best Rfree value for each parameter pair (γ, wDEN) among 20 repeats is
shown; for each parameter pair we performed 20 repeats of the
DEN-refinement protocol consisting of ten macrocycles of torsion-angle
refinement and restrained individual B-factor refinement (for details,
see text). The Rfree value is contoured using values calculated on a
6 × 6 grid (marked by small + signs) where the parameter γ is (0.0, 0.2,
0.4, 0.6, 0.8, 1.0) and wDEN is (0, 3, 10, 30, 100, 300); the results
for wDEN = 0 (i.e. torsion-angle refinement without DEN restraints) are
independent of γ and the same value was used for all grid points with
wDEN = 0. The value of Rfree varies from 0.444 to 0.479. The contour
plot shows two pronounced minima in the range 300 ≥ wDEN ≥ 100, with the
absolute minimum at wDEN = 300, γ = 0.2.

Acta Cryst. (2012). D68, 391-403

Figure 4

Comparison of various refinements and maps for residues 66-77. The
sequence numbers refer to the genomic sequence of Cgl1109 (see Fig. 1).
(a) Standard refinement (gray) versus the final model (orange).
(b) Standard refinement and one round of AutoBuild (blue) versus the
final model (orange). (c) DEN refinement (green) versus the final model
(orange sticks). (d) DEN refinement and one round of AutoBuild (magenta)
versus the result of semi-automated rebuilding (yellow) versus the final
model (orange). (e) 2mFo - DFc electron-density map after standard
refinement (blue mesh) and a subsequent round of AutoBuild (cyan mesh)
versus the final structure (orange sticks). (f) 2mFo - DFc
electron-density map after DEN refinement (blue) and a subsequent round
of AutoBuild (cyan) versus the final structure (orange sticks).
(g) Electron-density map obtained by density modification of the MAD map
(blue) versus the final structure (orange sticks). (h) 2mFo - DFc
electron-density map (blue mesh) of the final model (orange sticks).

DEN refinement generally requires a starting model that matches the
primary sequence of the target structure. Therefore, the
molecular-replacement solution obtained from the minimally optimized
MODELLER model was used as the starting point for DEN refinement. Side
chains that were pruned by Sculptor were added back to the model by
superimposing the complete model obtained by MODELLER on the Phaser
molecular-replacement solution. All B factors were reset to a uniform
value (50 Å²). The resulting coordinates were used as both the starting
and reference model for DEN refinement (Schröder et al., 2010). The
refinement protocol was similar to that used in previous work (Schröder
et al., 2010; as also described in the tutorial for DEN refinement in
CNS v.1.3; http://cns-online.org/v1.3/) except that isotropic restrained
individual B-factor refinement was carried out instead of restrained
group B-factor refinement, as appropriate for the resolution of Cgl1109.
Specifically, ten macrocycles of torsion-angle refinement and restrained
individual B-factor refinement were performed in which the first cycle
always used γ = 0, the following seven cycles used a specified value for
γ (see below) and the last two cycles were performed without DEN
restraints. The MLF target function (Pannu & Read, 1996) was used for
the refinement against the diffraction data at the inflection point (the
same diffraction data that were used in the work by DiMaio et al.,
2011). In the final stages of refinement, the diffraction data at the
high-energy remote wavelength were used (see below).
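
The ten-macrocycle schedule described above can be written down compactly. The following Python sketch is illustrative only: the function name is hypothetical, and representing 'without DEN restraints' as wDEN = 0 is an assumption made here for convenience (it mirrors the wDEN = 0 control used in the grid search), not a description of the CNS task file.

```python
def den_macrocycle_schedule(gamma, w_den, n_cycles=10, n_final_unrestrained=2):
    """Return one (gamma, w_den) setting per macrocycle.

    Cycle 1 uses gamma = 0, the middle cycles use the requested gamma and
    the final cycles switch the DEN restraints off (modeled as w_den = 0).
    """
    if n_cycles < 1 + n_final_unrestrained:
        raise ValueError("not enough macrocycles for this schedule")
    n_middle = n_cycles - 1 - n_final_unrestrained
    schedule = [(0.0, w_den)]                       # first cycle: gamma = 0
    schedule += [(gamma, w_den)] * n_middle         # seven cycles at the chosen gamma
    schedule += [(0.0, 0.0)] * n_final_unrestrained  # last two: no DEN restraints
    return schedule


schedule = den_macrocycle_schedule(gamma=0.2, w_den=300.0)
for cycle, (g, w) in enumerate(schedule, start=1):
    print(f"macrocycle {cycle:2d}: gamma = {g:.1f}, w_DEN = {w:g}")
```

With the default arguments the schedule has 1 + 7 + 2 = 10 entries, matching the protocol in the text.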

DEN distance restraints were generated from N randomly selected pairs of
atoms in the reference model that were separated by not more than ten
residues along the polypeptide sequence and were separated by 3-15 Å in
space (default settings for DEN refinement in CNS). The value of N was
chosen to be equal to the number of atoms, so the set of distance
restraints was relatively sparse, with an average of one restraint per
atom.
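
The selection criteria for DEN restraints can be illustrated with a short Python sketch. It is not the CNS implementation: the function name, the input layout (one tuple per atom) and the brute-force pair enumeration are assumptions chosen for clarity, and the toy chain below is made-up data.

```python
import math
import random


def select_den_pairs(atoms, n_pairs, max_seq_sep=10, d_min=3.0, d_max=15.0, seed=0):
    """Randomly pick atom pairs that qualify for DEN distance restraints.

    atoms: list of (residue_number, (x, y, z)) tuples for one chain.
    Returns a list of (i, j, distance) tuples with i < j indexing into atoms.
    """
    candidates = []
    for i in range(len(atoms)):
        res_i, xyz_i = atoms[i]
        for j in range(i + 1, len(atoms)):
            res_j, xyz_j = atoms[j]
            if abs(res_i - res_j) > max_seq_sep:
                continue  # more than ten residues apart along the sequence
            d = math.dist(xyz_i, xyz_j)
            if d_min <= d <= d_max:
                candidates.append((i, j, d))
    rng = random.Random(seed)
    # Sample without replacement so that no pair is restrained twice.
    return rng.sample(candidates, min(n_pairs, len(candidates)))


# Toy chain: one pseudo-atom per residue, spaced 3.8 A apart along x.
atoms = [(k, (3.8 * k, 0.0, 0.0)) for k in range(1, 31)]
pairs = select_den_pairs(atoms, n_pairs=len(atoms))  # N = number of atoms
print(len(pairs), "restraints for", len(atoms), "atoms")
```

Setting `n_pairs` to the number of atoms reproduces the sparse regime described in the text, with an average of one restraint per atom.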

Figure 5

Comparison of various refinements and maps for residues 251-276.
Residues 251-263 comprising an α-helix, residues 264-271 comprising a
loop and residues 272-276 comprising a β-strand are shown (the sequence
numbers refer to the genomic sequence; see Fig. 1). (a) Standard
refinement (gray) versus the final model (orange). Standard refinement
produces fragmented or incorrectly connected electron density (marked by
arrows). (b) Standard refinement and one round of AutoBuild (blue)
versus the final model (orange). Electron density is still fragmented or
shows incorrect connectivity. (c) DEN refinement (green) versus the
final model (orange). (d) DEN refinement and one round of AutoBuild
(magenta) versus the result of semi-automated rebuilding (yellow).
(e) 2mFo - DFc electron-density map after standard refinement (blue
mesh) and a subsequent round of AutoBuild (cyan mesh) versus the final
structure (orange sticks). (f) 2mFo - DFc electron-density map after DEN
refinement (blue) and a subsequent round of AutoBuild (cyan) versus the
final structure (orange sticks). (g) Electron-density map obtained by
density modification of the MAD map (blue) versus the final structure
(orange sticks). (h) 2mFo - DFc electron-density map (blue mesh) of the
final model (orange sticks).

We determined the optimum values of the γ and wDEN parameters of DEN
refinement by a global two-dimensional grid search (Fig. 3). At each
grid point, 20 refinement repeats were performed with different random
initial velocities and different randomly selected DEN distances. We
used 30 combinations of six γ values (0.0, 0.2, 0.4, 0.6, 0.8 and 1.0)
and five wDEN values (3, 10, 30, 100 and 300); we also included 20
repeats with wDEN = 0 (corresponding to using the refinement protocol
without DEN restraints, with the results being independent of γ). Of all
the resulting models, the one with the lowest Rfree value (0.444; Fig. 3
and Table 1) was used for subsequent model building and refinement.
Generally, if there are multiple models with similarly low Rfree values,
one could choose the one with the better geometry. The resulting model
was substantially better in many places than what could be achieved
using a standard refinement protocol (for a representative example,
compare Figs. 4a and 4b and see below).
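
The bookkeeping of this grid search is simple enough to sketch in Python. The sketch below is an assumption-laden illustration, not the scripts used in this work: `run_refinement` is a hypothetical callback standing in for one complete DEN-refinement job, and `fake_refinement` returns made-up Rfree values purely so that the example runs.

```python
import random

GAMMAS = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
W_DENS = (3, 10, 30, 100, 300)
N_REPEATS = 20


def grid_search(run_refinement, gammas=GAMMAS, w_dens=W_DENS, n_repeats=N_REPEATS):
    """Run every (gamma, w_den) pair n_repeats times and keep the lowest Rfree.

    run_refinement(gamma, w_den, seed) must return the Rfree of one repeat.
    Returns ((r_free, gamma, w_den, repeat), total_number_of_runs).
    """
    # w_den = 0 is independent of gamma, so it is run once as a control.
    grid = [(0.0, 0)] + [(g, w) for g in gammas for w in w_dens]
    results = []
    for gamma, w_den in grid:
        for repeat in range(n_repeats):
            r_free = run_refinement(gamma, w_den, seed=repeat)
            results.append((r_free, gamma, w_den, repeat))
    return min(results), len(results)


def fake_refinement(gamma, w_den, seed):
    """Stand-in for a real refinement job; the returned Rfree is made up."""
    rng = random.Random(f"{gamma}-{w_den}-{seed}")
    return 0.45 + 0.03 * rng.random()


best, n_runs = grid_search(fake_refinement)
print(f"{n_runs} runs; best Rfree = {best[0]:.3f} at gamma = {best[1]}, w_DEN = {best[2]}")
```

The 30 parameter pairs plus the single wDEN = 0 control give 31 grid points and therefore 620 refinement jobs at 20 repeats each, which is why such a search is usually distributed over a cluster.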



3.4. First round of automated model building with AutoBuild

Starting from the best DEN-refined structure, automated model building
with AutoBuild (Terwilliger et al., 2008) was performed. The default
settings for rebuilding the model without the addition or deletion of
residues (the rebuild_in_place=true option in AutoBuild) were used
except that 'morphing' was enabled and the resolution for multiple model
building was set to the limiting resolution of the diffraction data at
the inflection-point wavelength (3.17 Å). The morphing process in
AutoBuild consists of identifying a coordinate shift to apply to each
backbone N atom that maximizes the local density correlation between the
model and the map (Terwilliger et al., submitted). These coordinate
shifts are smoothed and applied to the structure to generate a morphed
structure. An initial map was used for AutoBuild consisting of the
average of the 2mFo - DFc electron-density maps corresponding to the top
20 models (in terms of Rfree) obtained from DEN refinement. Such map
averaging can be beneficial (Rice et al., 1998), although in this
particular case using the average map was similar to using the map
obtained from the top solution. This round of automatic model building
produced further improvements in the model (Figs. 4d and 5d) and lowered
the R values (Rfree = 0.418, Rcryst = 0.327; Table 2).
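
The map-averaging step amounts to ranking the refined models by Rfree and averaging their maps point by point. The Python sketch below shows the idea under simplifying assumptions: each map is a flat list of density values sampled on an identical grid, the function names are hypothetical, and the numbers in the example are made up.

```python
def average_maps(maps):
    """Point-by-point average of maps sampled on the same grid.

    maps: list of equal-length flat lists of density values.
    """
    if not maps:
        raise ValueError("need at least one map")
    n_points = len(maps[0])
    if any(len(m) != n_points for m in maps):
        raise ValueError("all maps must share the same grid")
    return [sum(values) / len(maps) for values in zip(*maps)]


def average_top_maps(models, n_top=20):
    """Average the maps of the n_top models with the lowest Rfree.

    models: list of (r_free, flat_map) tuples.
    """
    ranked = sorted(models, key=lambda item: item[0])[:n_top]
    return average_maps([density for _, density in ranked])


# Made-up example: three tiny two-point "maps" with their Rfree values.
models = [
    (0.46, [1.0, 2.0]),
    (0.44, [3.0, 4.0]),
    (0.50, [100.0, 100.0]),  # worst model, excluded when n_top = 2
]
averaged = average_top_maps(models, n_top=2)
print(averaged)
```

In practice the averaging would be done on map coefficients or gridded maps with crystallographic software; the sketch only shows the selection and averaging logic.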

At this point, it became clear from the electron-density maps produced
by DEN refinement (2mFo - DFc map) and AutoBuild (both 2mFo - DFc and
density-modified maps) that the model contained several incorrect
sequence registers, resulting in distorted α-helices and bulging loops
that had no electron density associated with them (a striking example is
shown in Fig. 6). DEN refinement and AutoBuild are currently unable to
automatically correct such sequence-register shifts and deformed
α-helices. In particular, AutoBuild has no facility for automatic
adjustment of sequence register or missing residues when building with
the rebuild_in_place approach. Still, it was possible to correct these
errors by semi-automated rebuilding and manual model building as
outlined below. In principle, completely automated rebuilding of the
model can be performed for structures at 3 Å resolution (e.g. starting
from an experimental electron-density map or a density-modified map of a
molecular-replacement solution), but for Cgl1109 this approach was not
successful, presumably owing to the relatively high anisotropy and B
values of the crystal structure. It should be noted that no experimental
MAD phase information had been used up to this stage of the refinement
process, so it is likely that the structure could have been completed
without experimental phase information (Fig. 6).

3.5. Comparison with standard refinement 

For comparison, we performed 'standard refinement' consisting of three
macrocycles of 200 steps of positional (xyz) minimization and 200 steps
of restrained individual B-factor refinement using CNS starting from the
same model that was used for DEN refinement. One round of automated
model building starting from this standard refined model was performed
using the same options for AutoBuild as for the DEN-refined model (see
above).

The R values that were achieved by DEN refinement were significantly
lower than those obtained by standard refinement (e.g. Rfree = 0.444
versus 0.517; see Table 2). Moreover, the DEN-refined structure was
significantly closer to the final model of Cgl1109 (representative
examples are shown in Figs. 4a, 4c, 5a and 5c). Automated model building
did not significantly improve the model after standard refinement
(Figs. 4b and 5b; Table 2), resulting in Rfree = 0.483 compared with
Rfree = 0.418 for the DEN-refined model. This example demonstrates that
DEN refinement produces significantly better models than standard
refinement for starting models that are far from the true structure,
enabling further improvements by automated model building with
AutoBuild.

In most places there was reasonable agreement between the final model
and the 2mFo - DFc electron-density maps computed after DEN refinement
or subsequent automated model building (Figs. 4f and 5f). In contrast,
the electron-density maps obtained by standard refinement with and
without subsequent automated model building were fragmented and
exhibited incorrect connectivity in several places (Figs. 4e and 5e).
Thus, structure completion would have been very difficult to achieve
with manual model building and standard refinement.

Figure 6

Comparison of various refinements and maps for residues 251-276. A
close-up view of the loop consisting of residues 264-271, which is also
part of Fig. 5, is shown. The final model is colored orange (sticks and
cartoon representation). The structure after the first round of DEN
refinement and AutoBuild is colored magenta (sticks and cartoon
representation) and the corresponding 2mFo - DFc electron-density map
(with model phases calculated from this structure, but without
experimental phase information, and contoured at 1.4σ) is colored marine
blue. The electron-density map clearly shows that the loop needed to be
corrected.

Table 2

R values for different refinement stages and, for comparison, for
standard refinement.

Structure                                   Rfree   Rcryst
Phaser solution                             -       0.649
Standard refinement                         0.517   0.432
Standard refinement + AutoBuild             0.483   0.374
DEN refinement                              0.444   0.399
DEN refinement + AutoBuild                  0.418   0.327
Second DEN refinement (MLHL)                0.397   0.366
Second DEN refinement + AutoBuild (MLHL)    0.372   0.325
Final refined                               0.257   0.238

3.6. Determination of selenium sites and MAD phasing 

The model obtained from the first round of DEN refinement and automated
model building was used to calculate anomalous difference Fourier maps
at the peak wavelength (λ3). These difference maps produced difference
peaks for the six selenium sites of the SeMet residues in the protein.
The positions of these six sites closely matched the positions of the Se
atoms in the model obtained after DEN refinement and automated model
building. Fig. 7 shows the standard deviations from the mean of the map
(σ) of these six sites and the highest noise peak. The standard
deviations of the peaks are compared with those obtained from standard
refinement with and without subsequent automated model building. The
combination of DEN refinement and automated model building produced the
most significant difference peaks, all of which were well separated from
noise. Standard refinement produced the poorest results, with three of
the sites close to noise peaks. For both standard refinement and DEN
refinement, automated model building with AutoBuild improved the
significance of the sites, although DEN refinement alone still produced
more significant peaks for some of the sites than standard refinement
and automated model building. In retrospect, it may have been possible
to obtain the positions of the six sites by ab initio search, for
example by using the HySS submodule (Grosse-Kunstleve & Adams, 2003),
although careful choice of the high-resolution limit is required
(truncation to 4.5 Å resolution) since a search against all diffraction
data produced only one site that matched one of the six selenium sites.
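
The significance measure used here, peak height expressed in standard deviations above the mean of the map, can be sketched in a few lines of Python. The function names are hypothetical and the numbers in the example are made up; they are not the values plotted in Fig. 7.

```python
import statistics


def peak_sigma_levels(map_values, peak_values):
    """Express peak heights as standard deviations above the map mean."""
    mean = statistics.fmean(map_values)
    sigma = statistics.pstdev(map_values)
    if sigma == 0:
        raise ValueError("map has no variation")
    return [(p - mean) / sigma for p in peak_values]


def sites_above_noise(site_sigmas, noise_sigma):
    """Return 1-based indices of sites that exceed the highest noise peak."""
    return [i + 1 for i, s in enumerate(site_sigmas) if s > noise_sigma]


# Made-up toy map with mean 1 and standard deviation 1.
toy_map = [0, 0, 0, 0, 2, 2, 2, 2]
site_sigmas = peak_sigma_levels(toy_map, [6.0, 4.0])
print(site_sigmas)
print(sites_above_noise(site_sigmas, noise_sigma=3.5))
```

A site is only useful for phasing if it stands clear of the highest noise peak, which is the comparison made in the text between the different refinement protocols.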

We next calculated MAD phase probability distributions to 2.97 Å
resolution and refined the six selenium sites using a maximum-likelihood
method (Burling et al., 1996) as implemented in CNS (Brünger et al.,
1998) using the mad_phase.inp task file. The diffraction data collected
at the three wavelengths were used (Table 1), anisotropic scale factors
between the three data sets were refined, individual B factors for the
anomalous sites were refined, occupancies were set to 1 and anomalous
form factors were constrained to be identical for all sites at a
particular wavelength. The phasing calculations resulted in an overall
figure of merit of 0.55 with reasonable overall scale factors, B factors
and anomalous form factors of f' = -6.14 (-6.95), f'' = 4.73 (3.15) at
the peak, f' = -11, f'' = 5.27 at the inflection point and f' = -3.32
(-3.59), f'' = 3.66 (1.05) at the remote wavelength, where the numbers
refer to the results from the Friedel mate F to Freference
lack-of-closure expressions and the numbers in parentheses refer to the
F to Freference lack-of-closure expressions (Burling et al., 1996). For
comparison, the predicted values obtained from a fluorescence scan of
the crystal are f' = -8.65, f'' = 6.21 at the peak, f' = -11.11,
f'' = 3.64 at the inflection point and f' = -1.70, f'' = 3.30 at the
remote wavelength. In our experience, the differences between the
refined values of f' and f'' for the two lack-of-closure expressions and
from the predicted values are not uncommon for SeMet MAD data.

The resulting MAD electron-density map was subjected to density
modification as implemented in CNS (Brünger et al., 1998) using the
density_modify.inp task file. The default settings were used, which
include solvent flipping with generation of the mask based on
root-mean-square electron-density fluctuations assuming 70% solvent
content. No atomic model was used for the generation of the mask and no
prior phase information was used for the refinement of anomalous sites
in order to avoid model bias. The resulting figure of merit was 0.81 and
the density-modified MAD electron-density map was connected but did not
allow unambiguous identification of side chains for many residues
(Figs. 4g and 5g). Although this map may be of sufficient quality such
that manual building could have been attempted, it would have been
challenging at this resolution. Indeed, automated model building using
the same map resulted in a very incomplete model: only 76 side chains
were fitted out of 360, with several false backbone connections.

Figure 7

Significance of selenium sites. The standard deviation above the mean
(σ) in anomalous difference Fourier maps is shown for the six selenium
sites of the SeMet variant of Cgl1109. For comparison, the standard
deviation of the highest noise peak is also shown. The amplitudes for
the calculation of the anomalous difference Fourier map were obtained
from the diffraction data at the peak wavelength (Table 1). The phases
were obtained from the atomic model after standard refinement (blue
diamonds), standard refinement followed by automated building with
AutoBuild (green triangles), DEN refinement (yellow squares) and DEN
refinement followed by automated model building with AutoBuild (red
circles).



3.7. Semi-automated completion of the refinement 

A second round of DEN refinement (using the current model obtained from
the first round of DEN refinement and automated model building as both
the starting and the reference model) and automated model building was
performed using the MLHL target function (Pannu et al., 1998) that
included the experimental MAD phase information, resulting in relatively
small localized changes in coordinates with some more significant
corrections of side-chain positions, improvements in R values and a
reduction of the Rfree - Rcryst difference (Table 2).
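
The reduction of the Rfree - Rcryst difference can be checked directly from the values in Table 2. The short Python sketch below simply tabulates the gap for each stage; the dictionary layout and function name are illustrative choices, and the Phaser solution is omitted because only one R value is listed for it.

```python
# (Rfree, Rcryst) pairs transcribed from Table 2.
TABLE_2 = {
    "Standard refinement": (0.517, 0.432),
    "Standard refinement + AutoBuild": (0.483, 0.374),
    "DEN refinement": (0.444, 0.399),
    "DEN refinement + AutoBuild": (0.418, 0.327),
    "Second DEN refinement (MLHL)": (0.397, 0.366),
    "Second DEN refinement + AutoBuild (MLHL)": (0.372, 0.325),
    "Final refined": (0.257, 0.238),
}


def r_gaps(table):
    """Return the Rfree - Rcryst difference for every refinement stage."""
    return {name: round(r_free - r_cryst, 3) for name, (r_free, r_cryst) in table.items()}


gaps = r_gaps(TABLE_2)
for name, gap in gaps.items():
    print(f"{name:42s} {gap:.3f}")
```

The gap falls from 0.091 after the first round of DEN refinement and AutoBuild to 0.031 and 0.047 for the two MLHL stages, and to 0.019 for the final model.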

As mentioned above, there were several regions that required correction
of register shifts and rebuilding of α-helices (a particular example is
shown in Fig. 6) that were not corrected even in the second round of DEN
refinement and automated model building. To correct these regions,
selected regions were deleted from the model and another round of
automated rebuilding with AutoBuild was performed, again using the
electron-density map from the previous model as the initial map, using
the experimental MAD phase information and the primary sequence, with
morphing enabled and the rebuild_in_place option set to false.
Interestingly, we found that using a 2mFo - DFc electron-density map as
the initial electron-density map for AutoBuild produced somewhat better
results for rebuilding in this particular case than using the
density-modified map generated by AutoBuild. The resulting models (using
models with different deletions as starting models for automated model
building) were inspected using Coot (Emsley et al., 2010) and the
portions that best fitted the electron-density maps were combined to
generate a hybrid model. Missing loops were fitted with the 'Fit Loops'
feature of PHENIX. This procedure of selected rebuilding by deletion of
the problematic regions and automated rebuilding was repeated several
times. This semi-automated method corrected the majority of cases of
incorrectly fitted α-helices and loops arising from register errors
(Figs. 4d and 5d, yellow versus orange models).
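
In this work the best-fitting portions were chosen by visual inspection in Coot. If a numerical fit score per region were available (for example a local map correlation, which is an assumption introduced here and not something reported in the text), the assembly of a hybrid model could be sketched as follows. All names and numbers in this Python example are hypothetical.

```python
def assemble_hybrid(region_scores):
    """Pick, for each region, the candidate model with the best fit score.

    region_scores: {region: {model_name: score}}, where a higher score
    means a better fit to the electron-density map.
    Returns {region: model_name}.
    """
    choice = {}
    for region, scores in region_scores.items():
        if not scores:
            raise ValueError(f"no candidate models for region {region}")
        choice[region] = max(scores, key=scores.get)
    return choice


# Made-up scores for three regions rebuilt from two different deletions.
region_scores = {
    (132, 140): {"rebuild_A": 0.71, "rebuild_B": 0.64},
    (251, 263): {"rebuild_A": 0.58, "rebuild_B": 0.80},
    (264, 271): {"rebuild_A": 0.62, "rebuild_B": 0.55},
}
hybrid = assemble_hybrid(region_scores)
print(hybrid)
```

Any such automated choice would still need to be checked against the map, since a single score can hide register errors of the kind discussed above.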

The remaining misfitted regions were manually corrected with Coot
(Emsley et al., 2010) interspersed with refinement with phenix.refine
(Adams et al., 2010). The final refinement (Table 1) employed residues
10-369 of Cgl1109 (a 369-residue protein) and other solvent molecules
(one phosphate ion and one chloride ion). It was performed against
diffraction data collected at the high-energy remote wavelength
(Table 1).



3.8. Biological implications and comparison between 1vgy and Cgl1109

C. glutamicum is a Gram-positive bacterium that finds industrial use in
the production of vitamins and amino acids, including glutamic acid,
which is used in the production of the flavoring agent monosodium
glutamate. Cgl1109 (NCBI reference sequence identifier NP_600337;
UniProt identifier Q59284) is a putative succinyl-diaminopimelate
desuccinylase (DapE) from C. glutamicum consisting of two domains: a
peptidase domain belonging to family PF01546 (Peptidase_M20) in clan
CL0035 of zinc metallopeptidases (~30 000 proteins in 12 families) in
v.25 of the Pfam database (Finn et al., 2010) and a dimerization domain
belonging to PF07687 (M20_dimer). These proteins have a broad
phylogenetic spread across all kingdoms of life, show substantial
sequence divergence and are essential for numerous biological processes
(for example, recombinant bacterial carboxypeptidase G2 is used in
cancer therapy to hydrolyze methotrexate and is being tested in prodrug
therapy, and human aspartoacylase is implicated in Canavan's disease in
the brain), but structural coverage exists for only a small fraction
(~0.3%) of the proteins in this clan. Cgl1109 was selected by the JCSG
to increase the structural coverage of these families and is one of ~20
structures determined to date (see
http://www.topsan.org/Groups/Zinc_Peptidase). DapE is involved in
producing L-lysine and L,L-2,6-diaminopimelate and its catalytic
mechanism is likely to involve two zinc ions.

The crystal structure of Cgl1109 reveals a dimeric structure from
crystal-packing considerations and as also suggested by the PDBePISA
server (Fig. 2). The dimeric assembly is promoted by the smaller of the
two domains of the molecule (Fig. 8), while the larger domain is the
putative catalytic domain. The dimeric assembly is consistent with
proteins from this family that contain a similar dimerization domain.
Electron density in 2mFo - DFc and mFo - DFc maps initially suggested
the possible presence of two zinc ions in the putative catalytic site,
which would be expected owing to the addition of 6 mM ZnCl2 during
cocrystallization (which was added based on putative functional
annotation and ligand screening in a fluorescence-based thermal shift
assay). However, we did not model zinc ions in the final model owing to
the uncertainty associated with high B factors and the absence of
significant peaks in the anomalous difference Fourier maps, including
from diffraction data collected at the zinc absorption edge.

Fig. 8 shows a superposition of Cgl1109 with the template used for
homology modeling (PDB entry 1vgy, chain A), a putative
succinyl-diaminopimelate desuccinylase from N. meningitidis. The
superposition shows that the overall fold is identical, but that there
are large differences in secondary-structural element placement and
length, as perhaps expected considering the low sequence identity (25%)
between the proteins and the resulting difficulties with
molecular-replacement phasing.

4. Conclusions 

Successful structure determination of the difficult
molecular-replacement example Cgl1109 illustrates the synergism between
DEN refinement and automated model building with AutoBuild. DEN
refinement is most beneficial at the early stage of the refinement
process, immediately after molecular-replacement phasing, when the model
is still relatively crude and distant from the true structure. For
Cgl1109, DEN refinement resulted in a model that is closer to the true
structure, producing improved model phases that in turn provide a better
starting point for automated model building. The improved model phases
also provided more significant peaks in anomalous difference Fourier
maps to better locate the six selenium sites of the protein. In
contrast, standard refinement (i.e. positional and B-factor refinement)
produced fragmented electron density with incorrect connectivity (marked
by arrows in Fig. 5e). The R values that we obtained after the initial
round of DEN refinement and automated model building with AutoBuild are
better than those reported in Table 1 of DiMaio et al. (2011)
(Rfree = 0.418 versus Rfree = 0.460). This difference is most likely to
arise from performing a full (γ, wDEN) grid search with multiple repeats
with different initial velocities and random selection of DEN restraints
at each grid point in the present work as opposed to a single DEN
refinement as was performed previously (DiMaio et al., 2011). Our
success in fully refining the Cgl1109 structure also demonstrates that
the combination of DEN refinement and automated model building is a
viable alternative to the Rosetta molecular-replacement approach (DiMaio
et al., 2011). However, further analysis is required to determine the
optimal application and potential limitations of both methods.




Figure 8 

Comparison of Cgl1109 with 1vgy-A. A superposition of the final model of
Cgl1109 (orange cartoon) and chain A of PDB entry 1vgy (blue cartoon) is
shown. The superposition was performed with PyMOL (DeLano, 2002).

Poorly fitted portions of the model after DEN refinement and automated
model building were readily identified by inspection of the
electron-density maps (Fig. 6). These electron-density maps
unambiguously suggested how to correct the model. It turned out that
most of these regions were related to local sequence misalignments. We
generated a structure-based alignment between the template 1vgy-A and
Cgl1109 using MUSTANG (Konagurthu et al., 2006) and compared it with
predicted alignments. PROMALS3D and HHpred correctly assigned 282 and
291 positions (of a total of 360 residues visible in the Cgl1109
structure), respectively. The difference between the PROMALS3D and
HHpred alignments is caused by a one-register shift involving an α-helix
(residues 132-140). This one-residue shift required manual rebuilding
when using the PROMALS3D alignment for the molecular-replacement search
model. In retrospect, it might have been beneficial to use models
generated by both the PROMALS3D and HHpred alignments as starting points
for DEN refinement and automated model building and then to generate a
composite model keeping the best-fitting parts of both models.
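
Counting correctly assigned positions requires comparing each predicted alignment with the structure-based reference. The Python sketch below shows one way to do this, assuming that a position counts as correct when a target residue is paired with the same template residue in both alignments; this definition, the function names and the toy sequences are assumptions made for illustration and are not taken from MUSTANG, PROMALS3D or HHpred.

```python
def aligned_pairs(target_row, template_row):
    """Map target residue index -> template residue index for one alignment.

    Both rows are equal-length strings that use '-' for gaps.
    """
    if len(target_row) != len(template_row):
        raise ValueError("alignment rows must have equal length")
    pairs = {}
    t_idx = p_idx = 0
    for t_char, p_char in zip(target_row, template_row):
        if t_char != "-" and p_char != "-":
            pairs[t_idx] = p_idx
        if t_char != "-":
            t_idx += 1
        if p_char != "-":
            p_idx += 1
    return pairs


def count_correct(predicted, reference):
    """Count target residues paired with the same template residue in both."""
    return sum(1 for t, p in predicted.items() if reference.get(t) == p)


# Toy example: the predicted alignment slips by one residue after position 3.
reference = aligned_pairs("ACDEFG", "ACDEFG")
predicted = aligned_pairs("ACD-EFG", "ACDEFG-")
print(count_correct(predicted, reference), "of", len(reference), "positions agree")
```

A single misplaced gap shifts every downstream residue by one, which is how a one-register error in a single helix can account for the difference between two otherwise similar alignments.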

Sequence-register errors that arise from local misalignments between the
target protein and the homology model can be difficult to correct using
automated model-building methods when working with electron-density maps
at low resolution or those based on highly anisotropic diffraction data.
Overinterpretation or misinterpretation of such low-resolution maps is a
real danger when they are manually interpreted without assistance from
more objective computational methods. Indeed, we were able to partially
automate the process by deleting the incorrectly aligned regions and
rebuilding the parts with automated methods; some remaining regions had
to be manually corrected. In particular, AutoBuild will sometimes misfit
α-helices at low resolution, tracing the chain through the center of the
α-helix (Fig. 5d, magenta). It should be noted, however, that in this
case the method of deleting the α-helix from the current model and
rebuilding it from scratch produced the correct fit (Fig. 5d, yellow).
However, in two other instances this approach was not successful and the
α-helices had to be manually rebuilt. It seems possible that this
process could be fully automated. This would be especially important for
low-resolution structures, in which interpretation of the
electron-density map by inspection can be subjective and can lead to
local misfitting (DeLaBarre & Brünger, 2005; Davies et al., 2008). It is
conceivable that a systematic method to probe the fit with different
local sequence alignments in problematic regions might produce the best
possible model for such low-resolution structures.